Diversity, disparity, and evolutionary rate estimation 
for unresolved Yule trees 

Forrest W. Crawford and Marc A. Suchard 
Typeset July 23, 2012 

Abstract 

The branching structure of biological evolution confers statistical dependencies on phenotypic 
trait values in related organisms. For this reason, comparative macroevolutionary studies usually 
begin with an inferred phylogeny that describes the evolutionary relationships of the organisms 
of interest. The probability of the observed trait data can be computed by assuming a model 
for trait evolution, such as Brownian motion, over the branches of this fixed tree. However, 
the phylogenetic tree itself contributes statistical uncertainty to estimates of other evolution- 
ary quantities, and many comparative evolutionary biologists regard the tree as a nuisance 
parameter. In this paper, we present a framework for analytically integrating over unknown 
phylogenetic trees in comparative evolutionary studies by assuming that the tree arises from 
a continuous-time Markov branching model called the Yiilc process. To do this, wc derive a 
closed-form expression for the distribution of phylogenetic diversity, which is the sum of branch 
lengths connecting a set of taxa. We then present a generalization of phylogenetic diversity 
which is equivalent to the expected trait disparity in a set of taxa whose evolutionary relation- 
ships are generated by a Yule process and whose traits evolve by Brownian motion. We derive 
expressions for the distribution of expected trait disparity under a Yule tree. Given one or more 
observations of trait disparity in a clade, we perform fast likelihood-based estimation of the 
Brownian variance for unresolved clades. Our method does not require simulation or a fixed 
phylogenetic tree. We conclude with a brief example illustrating Brownian rate estimation for 
thirteen families in the Mammalian order Carnivora, in which the phylogenetic tree for each 
family is unresolved. 

Keywords: Brownian motion. Comparative method, Markov reward process, Phylogenetic 
diversity. Pure-birth process. Quantitative trait evolution. Trait disparity. Yule process 



1 Introduction 



Evolutionary relationships between organisms induce statistical dependencies in their phenotypic 



traits ( Felsenstein , 1985). Closely related species that have been evolving separately for only a 
short time will generally have similar trait values, and species whose most recent common ancestor 



is more distant will often have dissimilar trait values (Harvey and Pagel, 1991). However, the 



origins of phenotypic diversity are still poorly understood (Eldredge and Gould, 1972; Gould and 
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can give rise to highly varying phenotype values (Foote 1993 Sidlauskas, 2007), and researchers 
disagree about the relative importance of time, the rate of speciation, and the rate of phenotypic 



evolution in generating phenotypic diversity (Ricklefs, 2004 Purvis, 2004 Ricklefs, 2006). 

Comparative phylogenetic studies seek to explain phenotypic differences between groups of taxa, 
and stochastic models of evolutionary change have assisted in this task. Researchers often treat 
phenotypic evolution as a Brownian motion process occurring independently along the branches of 



a fixed macroevolutionary tree (Felsenstein, 1985). In comparative studies, the Brownian motion 
model of trait evolution has a convenient consequence: given an evolutionary tree topology and 
branching times, the trait values at the concurrently observed tips of the tree are distributed ac- 
cording to a multivariate normal random variable. Brownian motion on a fixed phylogenetic tree 
is the basis for the most popular regression-based methods for comparative inference and hypoth- 



esis testing (Grafen, 1989 Garland et al 1992 Martins and Hansen, 1997 Blomberg et al, 2003 
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of interest becomes a two-step process: first, one must infer a phylogenetic tree; then, conditional 
on that tree, one estimates relevant evolutionary parameters, usually by maximizing likelihood of 
the observed trait data under the model for trait evolution. Unfortunately, the uncertainty involved 
in estimating the tree propagates into the comparative analysis in a way that is difficult to account 



for (but see Stone (2011)), and comparative researchers often lack a precise phylogenetic tree on 
which to base a regression analysis of trait data. Modern techniques for dealing with this issue gen- 
erally resort to simulation. Some researchers simulate a large number of possible trees and estimate 
parameters conditional on a single representative tree, such as the maximum clade credibility tree 



(see, for example, Alfaro et al (2009)). Alternative approaches that utilize simultaneous simulation 
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of trees and parameters via Bayesian methods are gaining in popularity (Sidlauskas, 2007; Slater 



et al 2012; Drummond et al, 2012). 



However, simulation methods can be extremely slow and may require assumptions about prior 
distributions of unknown parameters that are difficult to justify. Indeed, in macroevolutionary 
studies, the phylogenetic tree is often not of interest per se, but must be taken into account in 
order to accurately model the dependency of the traits under consideration. Many comparative 
phylogeneticists regard the evolutionary tree as a nuisance parameter in the larger evolutionary 
statistical model. For this reason, there is increased interest in tree-free methods of comparative 
analysis that preserve information about the variance of phenotypic values within unresolved clades 



(Bokma, 2010). 



To develop a method for comparative inference in evolutionary studies that does not rely on a 
particular tree, it is convenient to specify a generative model for phylogenetic trees. In the Yule 
(pure-birth) process, every existing species independently gives birth with instantaneous rate A; 



when there are n species, the total rate of speciation is n\ (Yule, 1925). The Yule process is 



widely used as a null model in evolutionary hypothesis testing and can provide a plausible prior 



distribution on the space of evolutionary trees in Bayesian phylogenetic inference (Nee et al, 1994 



Rannala and Yang, 1996; Nee, 2006). One can easily derive finite-time transition probabilities 



(Bailey, 1964), and efficient methods exist to simulate samples from the distribution of Yule trees. 



conditional on tree age, number of species, or both (Stadler, 2011). Interestingly, some researchers 



have pointed out that even the simple Yule process can have unexpected properties that may be 



relevant in evolutionary theory and reconstruction ( Gernhard et al 2008 Steel and Mooers 2010 ) 



Due to Yule trees' simple Markov branching structure and analytically tractable transition 
probabilities, many researchers have made progress in characterizing summary properties of the 



Yule process - that is, integrating over all Yule tree realizations. For example. Steel and McKenzie 



(2002) study aspects of the shape of phylogenies under the Yule model, such as the distribution 



of the number of edges separating a subset of the extant taxa from the MRCA; Gernhard et al 



(2008) find distributions of branch lengths; Steel and Mooers (2010) study the expected length of 



pendant and interior edges of Yule trees; and Steel and McKenzie (2001) and Mulder (2011) study 



the distribution of the number of internal nodes separating taxa. 

One important summary statistic for trees in biodiversity applications is phylogenetic diversity 



(PD), defined as the sum of all branch lengths in the minimum spanning tree connecting a set 



of taxa (Faith 1992). Applied researchers in evolutionary biology have found PD to be useful 



in conservation and biodiversity applications; see, e.g., Webb et al (2002), Moritz (2002), and 



Turnbaugh et al (2008). PD also has the virtue of being a mathematically tractable statistic for 



phylogenetic trees, and has attracted interest from researchers interested in its properties. For 



example, Faller et al (2008) show that the asymptotic distribution (as the number of taxa n — t- oo) 
of PD is normal and give a recursion for computing the distribution of PD where edge lengths 



are integral. Mooers et al (2011) discuss branch lengths on Yule trees and expected loss of PD 



in conservation applications. Most importantly for our study, Stadler and Steel (2012) find the 
moment-generating function for PD conditional on n extant taxa and tree age t under the Yule 
model. Following on these inspiring results, we seek now to study analytic properties of Yule trees 
that are useful for comparative evolutionary studies. 

In this paper, we present a framework for computing probability distributions related to diversity 
and quantitative trait evolution over unresolved Yule trees and describe methods for estimating 
related parameters. We first give a mathematical description of the Yule model of speciation and 
briefly discuss its properties. Next, we introduce the Markov reward process, a probabilistic method 
for deriving probability distributions related to the accumulation of diversity under a Yule model. 
In Theorem 1, we give an expression for the probability distribution of PD under a Yule model, 
conditional on the number of species n, time to the most recent common ancestor (TMRCA) or 
tree age t, and speciation rate A. We then demonstrate an important and previously unappreciated 



relationship between trait disparity, the sample variance for a group of taxa (O'Meara et al, 2006), 
and PD for traits evolving on a Yule tree via Brownian motion. Theorem 2 gives an expression 
for the distribution of expected trait disparity when integrating over the branch lengths of a Yule 
tree. Next, we describe a statistical method for performing fast maximum likelihood estimation of 
Brownian variance, given an unresolved clade and observed trait disparity. Our approach does not 
require fixing a phylogenetic tree or specification of prior probabilities for unknown parameters. The 
method is simulation-free and does not seek to infer branch lengths or ancestral states. We show 
empirically that our estimators are asymptotically consistent. We conclude with an application of 
our method to body size evolution in the Mammalian order Carnivora. 
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2 Mathematical background 

To aid in exposition, we briefly establish some notation. Denote the topology of a phylogenetic tree 
by T. A topology is the shape of a tree, disregarding branch lengths or age. We always condition 
our calculations on the phylogenetic tree having age t with n extant taxa. Let ti,t2, ■ ■ ■ denote the 
branching points of a tree, where tk is the time of branching from k to k + 1 lineages. We measure 
time in the forward direction, so at the TMRCA, t = 0. This is in keeping with our mechanistic 
orientation: the Yule process, to be developed below, runs forward in time from to t. 

2.1 Yule processes 

Let Y{t) € {1, 2, . . .} be a Yule process with birth rate A that keeps track of the number of species 
at time t. The transition probabilities Pmni't) = Pr(y(t) = n | Y{{)) = m) satisfy the Kolmogorov 
forward equations 

— ji — = -XPmi{t), and 

(1) 

= -XnPranit) + (n - l)AP^,._i(t) 

for n > 1. This infinite system of ordinary differential equations can be solved to yield closed forms 
for the finite-time transition probabilities, 

Pmn{t) = (''~\)e-"'''\l-e-^'r-"', (2) 
\m — 1 J 



which have a negative binomial form (Bailey 1964). In the Yule process, we are only concerned 



with the number of species that exist at any moment in time, not their genealogy. That is, we 
assume that the lineage that branches is chosen uniformly from all extant lineages. The transition 
probability ^ is useful for performing statistical inference: suppose we know the branching rate A 
and the age t of a tree, and we observe Y{t) = n. Then ^ gives the likelihood of our observation. 
Figure [T] shows an example realization of a Yule tree, with the corresponding counting process 
diagram below. In this example, A = 2, Y{0) = 2, and Y{t = 1) = 12. 

[Figure 1 about here.] 
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2.2 Markov reward processes 

In a Markov reward process, a non- negative reward accrues for each unit of time a Markov 



process spends in state k (Neuts, 1995). Consider a Yule process Y{s) beginning at 1^(0) = 1 and 



ending at Y{t) = n. The accumulated reward up to time t is 



Rt= I «y(s) ds. 
/o 



(3) 



When Y{s) is observed continuously from time to t, the process ay^s) is a fully-observed step 
function, and Rt can be easily computed as the area under that function. To illustrate, suppose 
that the process makes jumps at times ti, . . . ,tn-i-, and we define to = and t„ = t. We assume 
Y{s) is right-continuous, so Y{ti) = i + 1. Then at time t, the accumulated reward is 



Rt — ^ — ti-i) — ^ ai{ti — ti-i). 



(4) 



i=i 



i=l 



When only l'(O) and Y{t) are observed, it can be challenging to compute the distribution of Rt- 



In our proofs of Theorems 1 and 2, we appeal to the method developed by Neuts (1995) and Minin 



and Suchard (2008) to find reward probabilities conditional on 1^(0) and Y{t). Let 



Vmn{x,t) = Vi{Rt = x,Y{t) = n I y(0) = m) 



(5) 



be the joint probability that the reward at time t is x and the process is in state n, given that 
the process began in state m at time 0. This joint probability formulation is more mathematically 
convenient than the more natural conditional probability, as we demonstrate in the proofs of the 
Theorems. However, it is easy to transform Vmn{x,t) into the conditional probability via Bayes' 
rule, as we show below. Appendix [A] gives a preliminary lemma deriving a representation for Yule 
reward processes that will be useful in proving the Theorems that follow. 

3 The distribution of phylogenetic diversity in a Yule process 

The Yule process is a simple and analytically tractable mechanistic model for producing the birth 
times of a clade. If we assume that the species that undergoes speciation is chosen randomly 
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from the extant species at that time, then the Yule process is also a distribution over bifurcating 
trees of age t. The Markov rewards framework provides a technique to understand integrals over 
Yule processes in precisely this context. In this section, we study the distribution of PD in trees 
generated by a Yule process. PD depends only on the branching times, and not the underlying 
topology, of the phylogenetic tree, making it a suitable first step in our goal of integrating over 
trees in comparative studies. 

To proceed, let Y{s) be a Yule process with branching rate A that keeps track of the number 
of lineages at time s. We seek an expression for PD, the total branch length of the tree, which is 
equivalent to the area under the trajectory of the counting process Y(s). Define a Markov reward 
process with Y{s) and a/. = k for k = 1,2,.... Then 



Rt 



ay(s) 



ds 



Y{s) ds. 



(6) 



We now state our first Theorem giving an expression for the distribution of Rt in a Yule process. 
Theorem 1. For a Yule process with birth rate X, starting at Y{0) = m and ending at Y{t) = n, 



5{x — mt)e 



-mXt 



m = n 



-\x " 



(n — m — V)\ ^ \ i — \ j \m — \ 

^ ' j=m 



n — l\ f j — 1 



(7) 



{-iy-"'{x-jt)''-"'-'H{x-jt) n>m 



where 5{x) is the Dirac delta function and H{x) is the Heaviside step function. 

The proof of this Theorem is given in Appendix IB| There has been disagreement about whether 



the definition of PD in different contexts includes the root lineage (Faith 1992 Faith and Baker 



2006 


Crozier et al 


2006 


Faith 


2006 



the stem age of an unresolved clade, then taking ai = 1 in ([3]) includes the root in the distribution 
of accumulated PD, and ai = does not. The form of ([T]) will change slightly if m = 1 and ai = 0. 
The probability distribution of PD, conditional on Y{0) = m and Y{t) = n, is 



/y(x I m,n,t, A) 



Pmn ip) 



(8) 
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where Pmn{t) is the Yule transition probabihty Q. This family of probability distributions has 
some interesting properties. Figure [2] shows fY{x\m, n, t, A) for m = 1 (with ai = 1), n = 1, . . . , 8, 
A = 1.2, and t = 1. The unusual shape of the distribution for smaller n demonstrates the piecewise 
nature of the density, apparent in the functional form ([T]). Interestingly, Faller et al (2008) show 



that the distribution PD tends toward a normal distribution as n — )• 00, a fact suggested by the 
shape of the distributions in Figure [2] 

[Figure 2 about here.] 

These distributions have some practical uses. First, one can predict the PD that will arise 
under the Yule model from a collection of extant species up to time t in the future. Second, we 
can calculate the probability that future PD at time t in one group is greater than in the other, 
conditional on the number of species and diversification rates in both groups; this probability may 
have uses in conservation applications. Third, conditional on an inferred phylogenetic tree for a 
set of n taxa, one could compute the resulting PD x and perform a hypothesis test to evaluate the 
Yule-PD model using the quantity 

/•nt 

Pr(PD >x)= frix) dx (9) 

Jx 

where /y(x) is given by ^ and nt is the maximum PD that can accumulate in time t, conditional 
on Y{t) = n. 



4 The distribution of expected phenotypic variance 

Since researchers generally do not know the phylogenetic tree for a set of species with certainty, 
PD is not observable until after a tree has been estimated. Unfortunately, the uncertainty involved 
in estimating a tree propagates into subsequent estimates of PD based on that tree, and our 
distributional results may no longer apply. We therefore seek a distribution for an analogous 
quantity that is observable directly from knowledge of the number of species n and their trait 
values, bypassing the need to infer a detailed phylogenetic tree. For this, we will need a model for 
phenotypic trait evolution on the branches of an unknown phylogenetic tree generated by a Yule 
process. 
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The simplest and most popular model for evolution of continuous phenotypic traits on phylo- 



genetic trees is Brownian motion ( Felsenstein , 1985). Under this model, trait increments over a 
branch of length t are normally distributed with mean and variance a'^t. The trait values for ex- 
tant species at the present time are observed as the vector X = {Xi, . . . , For a given topology 
T with n taxa and branching times t = {t2, ■ ■ ■ ,tn-i), with tip data X generated on the branches 
of this tree by zero- mean Brownian motion with variance o"^, the tip data are distributed according 
to a multivariate normal random variable. More formally, 



X ~ N{0,a^C{T,t)), 



(10) 



where the entries of the variance-covariance matrix C(r, t) = {cij} are defined as follows: cu = t, 



and Cij is the time of shared ancestry for taxa i and j, where i ^ j- O'Meara et al (2006) introduces 
disparity, the sample variance of the tip data X, 



disparity(X) = -(X - X)'(X - X), 
n 



(11) 



where X is the mean of the elements of X. The expectation of the disparity, conditional on the 
tree topology r, branching times t, and the Brownian variance cr^, is 

Ex(disparity | T,t,a^) = -Ex((X - X)'(X - X) | T,t,a^) 



n 



a 



a 



a 



a 



n 



t- ^l'C(T,t)l 



i=l j<i 



n , 



i=l j<i 



(12) 



where we use the notation Ex to indicate that the expectation is taken over realizations of the 
Brownian process that generates X. The fourth line above arises since the matrix C(r, t) is sym- 
metric and every element on the diagonal is t. However, every entry aj is either zero or a branching 
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time from the vector t, so the nonzero terms in the sum consist of branching times t^. Let be the 



coefficient multiplying the branching time tk in the last line of (12). Then we can express expected 
disparity as a weighted sum of the branching times, 



Ex(disparity | r, t, ct^) = ct^ ^ Zktk- 



(13) 



k=2 



Figure [3] illustrates how tree topology determines the matrix C(r, t) and expected disparity. 



[Figure 3 about here.] 



4.1 Expected disparity as an accumulated reward 



The expected disparity (13) has features in common with PD, since it is a scalar quantity that 



accumulates over the branches of the tree from time to t. The difference is that disparity implicitly 



incorporates tree-topological factors, which enter (13) as weights in the sum of the branch lengths 



In addition, a Yule tree accumulates PD even when there is a single lineage, but the same is not 
true for disparity. We develop these ideas in greater detail in this section. 

Our goal is to express (13) as a Markov reward process in a form equivalent to (Q, 



Rt = cr^ ^ ak{tk - tk-i). 



(14) 



k=l 



From (12) we see that 



^1 — and a„_i can be found using a„ and z„_i, and so on. 



We can formalize this recursive solution for the rewards by equating (13) and (14) as follows: 



n-l 



Zktk = ^ ak{tk — tk-i) = a-ntn + ^(ofc — ak+i)tk- 



(15) 



k=2 



k=l 



k=l 



Then recursively solving for the a^s gives ai = and 



j=k 



(16) 



for k = 2, . . . ,n. Now defining i?t(a) to be the Yule reward process with rewards a = (ai, . . . , an) 
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under the topology r, the expected disparity is distributed as 



Ex(disparity | r, A, cr^,n) ~ a'^Rt{a) 



(17) 



Note that we no longer need to condition on the branch lengths t = (ti, . . . , tn) in the expected 
disparity - they have been "integrated out". Therefore, to find the distribution of expected trait 
variance under a Brownian motion process on a Yule tree with topology r, we need only find the 
relevant rewards a and compute the corresponding distribution of i2t(a). 

As a concrete example, consider the five-taxon tree in Figure |3j The expected disparity, given 
this topology r and arbitrary branch lengths t = {t2,t3,ti), is 



E(disparity|T, t, o"^) = o"^ 



4 4 2 2 ■ 

— t — to — tp. — td 

5 25 25 25 



(18) 



The coefficients are given by 



Solving for the rewards a, we obtain 



4 2 2 4 
25' ~25' ~25' 5 



(19) 



^ 12 16 18 4 

'25' 25' 25' 5 



(20) 



which is easily verified by hand. This leads us to our second Theorem, which gives an expression 
for the distribution of i?f(a). 

Theorem 2. In a Yule process with rate A and arbitrary rewards a = {ai, . . . ,an), the Laplace 
transform ofvmn{x,t\a) is given by 



A 



(m 



{X{k- j) +r{ak - aj)) 



m = n, and 



n > m. 



(21) 



The proof of this Theorem is given in Appendix [O To obtain the probability distribution of the 
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accumulated reward, we must invert (21), 



Vmn{x,t)=^-^[fmn{r,t)]{x). (22) 

For m = n and n = m + 1, there are simple expressions for the inverse Laplace transform. Under 



certain conditions on the rewards a, there is a straightforward analytic inversion of (21 ) for general 



n > m, but the rewards computed using the times of shared ancestry in a phylogenetic tree do 



not always satisfy these conditions. Therefore, it is often easier to numerically invert (21); we 



discuss this issue in much greater detail in Appendix [Dl and provide a straightforward method for 



numerical inversion of the Laplace transform (21) based on the method popularized by Abate and 



Whitt ( 1995 ) 



4.2 Approximate likelihood and inference for 

We now describe a statistical procedure for using Theorem [2] to perform statistical for the unknown 
Brownian variance a^. Suppose that in a clade of n species we have a crude tree topology r (without 
branch lengths). This topology could be derived from parsimony, distance-based tree reconstruc- 
tion methods, or one could simply use family/genus/species information to assign a hierarchy of 
relationships and resolve polytomies randomly. Given the tree topology r, one can compute the re- 
wards a. Suppose also that we have calculated trait disparity for each of J independent continuous 
quantitative traits that arise from Brownian motion on the branches of the unknown phylogenetic 
tree, starting at the root. Let 

= i (x(^) - (X(^) - , (23) 

be the observed disparity for the jth phenotypic trait, where X*^-'^ is the vector of n trait values for 
the jth phenotypic trait and X^^^ is the mean of the elements of X^-') . Then we calculate the mean 
disparity Dn across these J traits: 



Dn = -j^D^r^^. (24) 



Note that in order to find Dn, we do not need the individual trait measurements themselves - 
only the disparities. Then by the law of large numbers, Dn — >■ K{Dn) as J — )■ oo, where Dn is the 
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asymptotic mean disparity across all possible traits. Therefore, we approximate the distribution of 
Dn as follows: 

~ Ex(disparity | r, o"^, A, n, t) (25) 

where a is the vector of rewards obtained from the topology r. This approximate relation provides 
the connection between observable mean trait disparity and the probability distribution in Theorem 
dthat we need in order to compute the probability of the observed disparities. Suppose the stem 
age of an unresolved tree is t, and let 

fYW = (26) 

be the distribution of expected disparity in a Yule process with general rewards a, conditional on 
y(0) = m and Y{t) = n. Here, x is the expected trait disparity, which we approximate by our 



observed (and therefore fixed) D^- From (25), we write 



^ ~ ^t(a) (27) 

so the likelihood is approximately 

fY{Dn/<y^). (28) 
Finally, we propose the approximate maximum likelihood estimator 

= argmax fyiDn/a'^). (29) 

To find (T^, note that in a Yule reward process in which < aj for k < j, the value of the reward 
is constrained to lie in the interval 

amt < -Rt(a) < ant (30) 
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where we have assumed 1^(0) = m and Y{t) = n. Additionahy, when none of the rewards are 



zero we are justified in dividing Dn by the terms in (30) to obtain bounds on the possible value of 
(T^ that maximizes (28), 

(31) 



When m = 1 and ai = 0, the upper bound is infinity. However, in practice when m = 1 and n > 4, 
it is safe to assume that 

Unt a2t 



We solve ( 29 ) using the numerical Newton- Raphson method provided by the R function nlm. 



We emphasize that (29) is not a traditional maximum likelihood estimator. We have approx- 



imated the distribution of Dn by the distribution of expected disparity, giving an approximate 
likelihood (28) that may not attain its maximum at the same value of cr^ as the true likelihood. 



In addition, the density (28) is non-differentiable at several points, and for n = m + 1 attains its 



maximum at the lower boundary i?t(a) = amt (see Figure [2] with m = 1, n = 2). These issues 
complicate application of traditional asymptotic theory for maximum likelihood estimates, and 
the classical large sample theory may not apply because the likelihood is only an approximation. 
One consequence of the violation of these traditional assumptions is that we are unable to provide 



meaningful standard errors for a using only the approximate likelihood (28). 



4.3 Simulations 

To empirically evaluate the correctness of the analytic distributions we derived in Theorem [2| we 
simulated trait data via Brownian motion on trees generated by a Yule process with age t = 1 and 
branching rate A = 1. For n = 3, . . . , 8, we chose one tree topology and simulated A'^btimes = 2000 



sets of branch lengths from a Yule process for n species (Stadler, 2011). For each set of branching 
times, we simulated N-qm = 2000 realizations of Brownian motion with cr^ = 1 to generate trait 



values at the tips of the tree ( Paradis et all |2004[ ). For each of the 2000 sets of tip values, we 
calculated the mean disparity using ( 12 ). Figure[4]shows histograms of the mean disparities with the 
analytic distribution fvi-c) overlaid, with good correspondence. The tree topology (with arbitrary 
branch lengths for display) is shown in gray above each histogram. 
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[Figure 4 about here.] 



To evaluate our estimation methodology, we take a similar approach, but for each simulated 
set of mean disparities, we infer cr^, an approximate maximum likelihood estimate of cr^. Figure [5] 
shows estimates of for different species richness n under different simulation conditions. For n = 
3, . . . , 12, we generated 100 trees, each with A^'btimes = !> 5, 10, 100 sets of branching times. For each 
set of branching times, we evolved A'bm = 1)5, 10, 100 traits by Brownian motion along the branches 
with rate o"^ = 1 and computed the mean disparity. Then, given the A'btimes mean disparities, we 
maximized the approximate likelihood to find ct^. Each dot in Figure [s] represents one estimate, and 
the dots are jittered slightly to show their density. The variance in the estimator is large when the 



number of simulated branching time sets and Brownian realizations is small since (25) and hence 



(27) become poor approximations to the mean disparity. However, the approximate maximum 
likelihood estimator for cj^ appears to have the desirable property of statistical consistency: the 
deviation of the estimates from the true value = 1 goes to zero as the number of mean disparity 
observations becomes large. 



[Figure 5 about here.] 



5 Application to evolution of body size in the order Carnivora 

To illustrate the usefulness of our method in practical comparative inference, we estimate thirteen 
family-wise Brownian variance rates for body size evolution in the mammalian order Carnivora 



using observed log-body size disparities and species richness information (Gittleman 1986, Gittle- 



man and Purvis, 1998 Slater et al, 2012). Carnivora includes members with very large and small 



body masses, including wide diversity within individual families (Nowak and Paradiso, 1999). We 



included only families with 2 or more species, since families with only one species do not reveal 
useful information about intra-family Brownian variance. The dataset, comprising 284 species, in- 
cluded the families Canidae, Eupleridae, Felidae, Herpestidae, Hyaenidae, Mephitidae, Mustelidae, 
Otariidae, Phocidae, Prionodontidae, Procyonidae, Ursidae, and Viverridae. Figure [6] shows the 
backbone phylogeny (from Eizirik et al ( 2010[ )) and the unresolved clades. Our analysis takes advan- 
tage of utilities for manipulating trees and quantitative trait data in the ape package (described in 
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Paradis et al (2004)) and simulating trees with the TreeSim package (described in Stadler (2011)), 



using the statistical programming language R (CRAN, 2012). We intentionally limit our analysis 



to the Brownian variance in each family and use only body size disparity in order to demonstrate 
the simplicity of our approximate method under conditions of very little data. 

[Figure 6 about here.] 

The first step is to estimate the speciation rate A from the backbone tree and the unresolved 
clades. Even though the tree is unobserved within each family, we can still find the exact maximum 
likelihood estimate of A for the tree as a whole. On a fully resolved branch of length t, the log- 
likelihood of A is log(A) — Xt. For an unresolved clade of age t with one species at the crown which 
grows to include n species, the log-likelihood from ^ is proportional to —Xt + (n — 1) log(l — e~^^). 
Summing these partial log-likelihoods over the whole tree gives the log- likelihood for A; maximizing 
this function, we find that A = 0.069 per million years. In what follows, we assume that A = A. 
To each unresolved family clade we associate the clade disparity for body size. Our analysis will 
consist of estimating the family- wise Brownian variance (t| for the jth. family, assuming one species 
at the stem age for each clade shown in Figure |6| In this way, we integrate over crown ages for 
each family under the Yule model. To apply the methodology for modeling expected trait disparity 
developed in section HI we regard the unresolved clades as "soft polytomies" in which branch lengths 



are unknown (Purvis and Garland, 1993). We therefore resolve these polytomies randomly without 
assigning branch lengths. 

Our analysis of the Carnivora family- wise Brownian variances takes approximately 30 seconds 
to run on a laptop computer. However, to evaluate the variability in our estimates, we ran the 
analysis 100 times with randomly resolved polytomies for each family. Table [T] shows the family 
name, species richness (n), stem age (t), observed body size disparity (Dn), the mean estimate of 
Uj, and the approximate standard error of the estimate for each family. We calculated standard 
errors as the empirical standard deviation of the 100 Brownian variance estimates. Our estimates 
of (T^ reveal readily interpretable information about the evolution of body size in each family that 
is not available directly from observed disparities alone. For example, the families Herpestidae 
and Viverridae have almost identical species richness, but quite different disparity measurements. 
Perhaps surprisingly, we have estimated nearly equal Brownian variances for the two families. Why 
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does our method produce such similar estimates of cj^? The answer hes in the ages of the clades - 
Viverridae is almost 50% older than Herpestidae. Two clades with the same richness whose traits 
evolve by Brownian motion at the same rate can exhibit very different disparity measurements, 
depending on their ages. This example illustrates how our method provides an approximate way 
to untangle the complex interaction of time, species, and observed trait variance (in the words of 
Ricklefs ( 2006| )) for unresolved clades. 



[Table 1 about here.] 

6 Discussion: Comparative phylogenetics without trees? 

In this paper we have outlined a method for integrating over Yule trees. We presented an expression 
for the distribution of PD in an unresolved tree, conditional on the number of species n and age t. We 
showed that the expected disparity can be represented in a similar way as PD, since it accumulates 
along the branches of a phylogenetic tree. We also derived a statistical framework that uses a very 
small amount of information (n, t, and Z)„) for an unresolved clade to derive a meaningful estimator 
for the Brownian rate o"^. It may seem counterintuitive that one can estimate the Brownian rate 
for an unresolved tree with n taxa, given a single disparity measurement. However, the structure 
provided by the Yule process allows this inference by providing just enough information about the 
distribution of branching times that generate the tree to model the average phenotypic disparity 
under Brownian motion. This permits analytic integration over two random objects: the collection 
of branching times of the tree and realizations of Brownian motion. Three assumptions make 
this possible: first, we fix a topology r without branch lengths; second, we assume that branch 
lengths come from a Yule process; third, we compute the distribution of expected disparity, which 
is a scalar quantity that encapsulates the most important information in the covariance matrix 
C(t). In exchange for these assumptions, we gain what frustrated reviewers of Felsenstein's paper 
apparently wished for: an estimator that "obviate [s] the need to have an accurate knowledge of 



phylogeny" ( Felsenstein , 1985). Whether these assumptions are warranted depends fundamentally 
on the scientific questions at hand, and the available data. 

Perhaps the most satisfying use of our method is in providing an approximate and model-based 



answer to the questions posed by Ricklefs (2006) and Bokma (2010), in their similarly-named pa- 
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pers. Our answer is approximate because it substitutes an observation for an expectation in (25) 



it is model-based because we assume that trees arise from a Yule process and traits evolve Brow- 



nian motion. Equation ( 25 ) expresses a heuristic relationship explaining the origins of phenotypic 



disparity, which we reproduce here for emphasis: 



(33) 



On the left-hand side is the observed disparity. On the right-hand side, Rt{s() serves as a scalar 
summary of tree shape - it depends only on n, clade age t, and tree topology r. This reveals that 
even when we restrict our attention to expected disparity under the simplest evolutionary models, 
the interaction between n, t and the branching structure of the tree in iit(a) is complex, but the 
Brownian variance simply scales the tree-topological term. We see that Z)„ scales linearly with o"^ 
when n, t, and the topology r are fixed. However, changing one of n, t, or r while holding a'^ 
constant will induce a nonlinear change in We conclude that it is not possible to partition 
the time-dependent and speciation-dependent influences on the accumulation of trait variance in a 



simple way as suggested by Ricklefs (2006) under the stochastic models we study in this paper. 



As an inducement to spur research on analytic integration over trees, Bokma (2010) offers 



a monetary reward for an expression for the distribution of sample variance from a birth-death 
tree with Brownian trait evolution on its branches. We have solved a simpler version of Bokma's 
challenge by providing the distribution of expected trait variance for a specific topology under a 



pure-birth process. The expression Bokma (2010) seeks is difficult to find for two reasons: first. 



it would require analytic integration over discrete tree topologies; second, and more intuitively, 
integrating over both topologies and Brownian realizations would subsume the Brownian variance 



a on the right-hand side of (25 ) into a nonlinear term that depended on n, t, and a in a very com- 



plicated way. Alternatively, simulation-based approaches provide an appealing alternative method 
to integrate over trees and Brownian motions without requiring approximation of the disparity 
by its expectation. Indeed, Bayesian methods exist to sample from the distribution of Yule trees, 
conditional on observed trait values at the tips, thereby providing both estimates of Brownian rates 



and phylogenies simultaneously, while using all the available trait data (Drummond et al, 2012). 



Tree-free comparative evolutionary biology comes at a price - there are several important draw- 
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backs to our approach. First, even under the Yule model for speciation (with the correctly specified 
branching rate A) and zero-mean Brownian motion for traits, integrating over all possible Yule trees 
introduces great uncertainty in estimates of . Figure [s] illustrates this issue: while the estimates 
of fj^ eventually converge to the true value as the number of branch length and trait realizations 
becomes large, the variance in these estimates can be substantial for smaller datasets. Furthermore, 
the assumption that l)„, ~ a^Rt(a) may be suspect if the number of traits analyzed is small enough 
that the mean trait disparity is a poor substitute for the expected disparity. 

We conclude with a mixed message about analytic integration over trees. First, it is possible 
to derive meaningful estimators for parameters of interest under simple evolutionary models, if one 
is willing to make assumptions about the mean behavior of the models. The estimates are usually 
reasonable, and may provide valuable insight into the basic properties of evolutionary change under 
these models - even our simplistic analysis of Carnivora body size evolution reveals the complex 
interaction of clade age, species number, and evolutionary rate. These estimates may be useful 
as starting points for more time-consuming simulation analyses. Second, and more pessimistically, 
sophisticated analytic methods for integrating over trees cannot conjure evolutionary information 
from the data that is not there already. As evolutionary biologists further refine our knowledge of 
the tree of life, the number of clades whose phylogeny is truly unknown may diminish, along with 
interest in tree-free estimation methods. 
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A Markov rewards for Yule processes 

In this Appendix, we prove one lemma and the two Theorems presented in the text. In the 
first proof, we derive a representation of the forward equation for a Yule reward process. Our 
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development follows that given by Neuts (1995). 



Lemma 1. In a Yule reward process Rt = Jq ay^^) ds with arbitrary positive rewards 01,02,..., 
the Laplace-transformed reward probabilities satisfy the ordinary differential equations 

^fin^y^ = -(nA + anr)fmn{r, t) + {n - l)A/^,„_i(r, t). (34) 

Proof. Let = Pr(i?t < x,Y{t) = n \ Y{0) = m). We can re-write this quantity in a more 

useful form by conditioning on the time of departure u from state m, noting that the accumulated 
reward is a^u, and then integrating over u. If m = n and no departure occurs, the accumulated 
reward is Omt- Putting these ideas together, we obtain 

Vmn{x, t) = Fv{Rt < X, Y{s) = m for < s < t) 

+ / Pr(m — )• m + 1 at time u)Vm+i,n{x — OmU, t — u) du (35) 

= 5m„e"'"^*iJ(x - Omt) + / mAe~"'-^"Ki+i,„(x - o^u, t - u) du. 

Jo 

Now consider the Laplace transform Fmnir,t) of Vmn{x,t), with respect to the reward variable x, 

t) = .^[Vmn{x,t)\{r) 

/•CO 

= J e~'''^Vmn{x,t) dx ^3g-) 
+ / mAe"™^" / e^'''^Vm+i,n{x - amU.t - u) dx du. 

Jo J 



~>mn 

r 



Making the substitution y = x — in the Laplace integral, we have 

-(mX+am,'i")t rt pco 



e 

r 

g -{mX+amr)t ft 



Fmnis, t) = 6mn + / mAe"™^" / e-'-(?^+"-") (y, t - u) dy du 

Jo Jo 



(37) 

Smn— — + / mAe-(™^+"'"^)"F^+i,„(r,t-u) du. 



r 







Now multiplying both sides by and differentiating with respect to t, we obtain 



d_ r 
di 



^\ (38) 



d_ 
di 



mAe("^^+'^-'-)"F™+i,„(r,u) du. 
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Expanding the left-hand side by the product rule and using the fundamental theorem of calculus 
on the right, we find that 



Cancelling common factors and rearranging, we obtain the Kolmogorov backward equation, 



d_ 
di 



Fmn{r, t) = -{m\ + amr)Fmn{r, t) + mXFm+i,n{r, t). 



(40) 



However, 



d_ 
dx 



(r) = ^[Vmnix,t)]{r) = fmnir,t). 



(41) 



Plugging rFmn{r,t) = fmn{r,t) into (40), we find that the fmn{T,t) satisfy the same system of 
ordinary differential equations. 



dt 



fmn{r,t) = -{mX + amr)fmn{r,t) + mXfm+i,n{r,t). 



(42) 



These are the backward equations for the Laplace transformed reward process. To solve (42), we 



note that any solution to the forward equations is a solution to the backward equations in a birth 



process (Grimmett and Stirzaker, 2001). Therefore, (42) is equivalent to the forward system 



dfmnjr, t) 
dt 



-{nX + anr)fmn{r, t) + {n - l)Xfm,n-iir, t) 



(43) 



for n = m, m + l,m + 2, ... This completes the proof. 



□ 



B Proof of Theorem [1] 

Proof. Lemma [1] with = k for A; = 0, 1, . . . gives 



dfmnir,t) 

dt 



-n(A + r)fmn{r, t) + {n - l)Xfm,n-i{r, t). 



(44) 
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Define gmni^, s) to be the Laplace transform of fmnif, t) with respect to the time variable t. Trans- 



forming (44) gives 

sgmnir, s) - 5mn = -n{\ + r)gmn{r, s) + {n- l)Xgm,n-i{r, s). (45) 



Letting m = n, we find that 



gmm{r,s) = — ] (46) 

s + m(A + r) 



Next, we form a recurrence and solve for gmnif, s) to obtain 



(n-l)A 

gmn[r,S) = — -— -gm,n-l{r,S) 

s + n(A + r) 



(n - 1) • • • mA"-"" , , 

gm,m+i{r,s) (47) 



Ul=m+i [^ + j(A + r)] 
_ (n-1)! A"-'" 
"("^-l)!n-=™ [. + j(A + r)]- 

We proceed via a partial fractions decomposition of the product in the denominator above, 

^1 



i^-^V-^\t^^ j s + jiX + r) 



(m-1)! ^ (A + r)"-™ s + vYA + r) 

in-lV. ,n-m [(-l)^-(j-^)!(n-j)!]'^ 1 
(m-1)! ^ (A + r)"-"^ s + j(A + r) 

J — 171 

n-l\/j-l\ (-1)J~™ 1 



(48) 



^ \2 - \)\m-\) 



- ly (A + r^-^ s + j(A + r) ' 
when n> m. Inverse transforming with respect to s, we obtain 

/r„m(r,t) = e-'"(^+'-)* (49) 



and 

'n-l\/j-l\ (-l)J-'" 



]=m 
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when n > m. Again inverse transforming (50 ), this time with respect to the Laplace reward variable 
r, we find that for m = n, 

Vmm{x, t) = 5{x - mt)e-™^* (51) 
which is a point mass at x = mt. For n > m, 

\n-m -\x _ T\ / o _ 1 \ 

Vmn{x,t) = ^ . P h-iy-m^^-jtr-^-^H{x-jt). (52) 

[n-m- ly. ^ \j -IJ \m- IJ 
This completes the proof. □ 

C Proof of Theorem [2] 

Proof. Lemma [T] with arbitrary rewards o^, A: = 1, 2, . . ., gives 



dt 



{nX + anr)fmn{r, t) + (n - l)Xfm,n~i{r, t). (53) 



To solve the system, apply the Laplace transform with respect to time t. First note that the 
transform of fmmi''',t) is 

gmmir,s) = — ^- . (54) 

Transforming the nth equation, and recalling that /mn(?', 0) = for n > m, 

sgmn{r, s) - fmnir,0) = -{n\ + anr)gmn{r,s) + (n - l)Xgm,n~i{r,s) 

Omnir, s){s + nA + a„r) = (n - l)Xgm,n~i{r, s) 

_ (n- 1)A 

s + nX + anV {00) 



V n—m 



(n-1)! A- 

(m - 1)! [lj=m+iis + J^ + ajr) 
(n-1)! A"-*" 



We expand the denominator by partial fractions to find 



(m — 1)1 ^-^ s + 7A + a,r 
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Transforming back to the time domain, we have, for m = n, 



f (r — „-{mX+amr)t 



(57) 



When n > m, 



fmn{r,t) = y 



.{n-l)\ 



j=m Hfc^j 



-{jX+ajr)t 



{X{k-j) +r{ak - aj)) 



This completes the proof. 



(58) 



□ 



D Analytic and numerical inversion 



Analytic inversion of (21) in Theorem [2] is possible, but unfortunately depends on the structure of 
the tree topology in unexpected ways. One convenient property of the rewards a = (ai, . . . , a„) is 



that ai < Cj+i for all 1 < i < n — 1, a fact apparent from (12). Therefore Oj 7^ aj for distinct i and 



j. When m = n, no speciation events have taken place, and we have 



Vmm{x,t) = 5{x - amt)e 



-mXt 



(59) 



For n = m + 1 , there is only one distinct topology, so 



mXe 



-mXt 



exp ( - — ^^I^^ ) H{x - ami) 



e^* exp 



\{x - am+it) 



H{x - am+it) 



(60) 



In general, when 



CLl — dj CLjf — CLj 



for any /, k, or j in 1, ... , n, then (58) becomes 



(61) 



fmn{r,t) = X 



n-m -^)- 
(m — 1) 



D! ^ 



-(jX+ajr)t 



j=m rifc^jK - aj) + r 



(n- 1)! 



-{jX+ajr)t 



\n-m V" X ^ ■y ^ e=/=j 



A Ok - aj) 



E 



A(^-i) _ A(fc-j) 



Kk-3) 



df^ — a 



+ r 



(62) 
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and so the full probability density for n > m is 



Vmn[X,t)-A 1)! n- ■ -"-^ ^ 



(m 



-a,- a^—aj 



(63) 



However, the rewards a for many topologies do not satisfy (61). This can be seen in Figure [7| 



where m = 2 and n = 4. Then we see that when j = 2, k = 4, and ^ = 3 in (63), 



04 — a2 0.75 — 0.5 



(64) 



and 



03 - 02 0.75 - 0.625 



(65) 



so the denominator in the second sum in (63) is zero. Unfortunately this happens whenever there 



is symmetry in the tree so that more than one pair of taxa have the same time of shared ancestry. 
Note also that (63) does not reduce to (21) in Theorem [2] when ak = k since the denominator in 



the summand of (63) is zero. 



[Figure 7 about here.] 



Despite the difficulty in writing a general inversion to obtain fmn{x, t) for any topology, numer- 



ical inversion to arbitrary precision remains straightforward. Abate and Whitt (1995) describe a 



numerical method for inverting the Laplace transform of probability densities by a discrete Riemann 
sum using the trapezoidal rule with step size h: 



eA/2 

Vmn 



where we choose A = 20. 



f i-t 
' 2x 



k=l 



A + 2A:7ri 

Jm,n \ 2^ ' 



(66) 
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Figure 1: Example of a Yule (pure-birth) tree with the corresponding counting process Y{t) below. 
The birth rate in this example is A = 2. In the counting process representation of this realization, 
we only keep track of the total number of species in existence at each time. 



30 







Figure 2: Probability densities of phylogenetic diversity (PD), or total branch length, under the 
Yule process starting at 1^(0) = 1, ending at Y(t) = n for n = 1, . . . , 8, with t = 1 and A = 1.2. 
When n = 1, no births have occurred, so the accrued PD must be exactly x = t = 1, which we 
represent here as a point mass at 1. For n = 2, the minimum accumulated PD is one, since the 
process spent at most one unit of time with one species; likewise the maximum accumulated PD is 
two, since the process spent at most one unit of time with two species. The functional form of ([7|) 
reveals the piecewise nature of the density, which gradually becomes smoother as n becomes large. 
The vertical probability axis is the same for all plots. 
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Figure 3: How tree topology determines the matrix of Brownian covariances and expected disparity. 
At left, a tree topology r of crown age t has 5 taxa and branch times t = (t2) ^3; ^4); where tk is the 
time of the branch from k io k + 1 lineages. At right, the corresponding matrix C(r, t) of Brownian 
covariances. The diagonal entries of C(r, t) are all t. The (i,j)th entry of C(t, t) is the time of 
shared ancestry between taxa i and j, for i ^ j. For example, taxa 1 and 2 share ancestry for time 
t4. Expected disparity (using Brownian variance cr^) is calculated using (12). Trait disparity cannot 
accumulate when there is only one species, so we draw the tree r beginning with two lineages at 
time 0. 
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T 1 I 1 1 I r 



0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 

Figure 4: Empirical correspondence between the derived expressions for the distribution of expected 
disparity and simulated mean disparity histograms for trees with different numbers of taxa. For 
each n = 3, . . . , 8, we simulated a single tree topology (shown above each histogram in gray). We 
then simulated 2000 sets of branch lengths for this topology under the Yule process. For each set 
of branch lengths, we calculated the mean disparity from 2000 simulations of zero-mean Brownian 
motion with variance cr^ = 1 on this tree. 
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Figure 5: Empirical consistency of approximate maximum likelihood estimates of . The number 
of species n in the unobserved phylogenetic tree is shown on the horizontal axis. Each set of plots 
shows 100 estimates of from A^j^timcs simulations of different branching times under a Yule process 
with n species and from A^bm independent realizations of Brownian motion used to compute the 
mean disparity for each set of branching times. The estimates are jittered to show the sampling 
distribution. The gray dotted line shows the true value cr^ = 1. 
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Figure 6: A family-level phylogenetic tree for order Carnivora. The phylogeny within each family, 
shown here as a gray triangle, is not known with certainty. The length of the base of the gray 
triangles represents the number of species in the family. The "backbone" tree connecting the 
unresolved clades is assumed fixed. Branch lengths are shown along each branch. 
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Figure 7: Demonstration of a problematic reward vector a computed for the symmetric four-taxon 



tree. In this case, the analytic inversion formula (63) cannot be applied, since the denominator in 
the sum becomes zero. 
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Family 


Richness 


TMRCA 


Disparity 




a'' 


SE 


Prionodontidae 


2 


33, 


,30 


0, 


,131 


0, 


.051 







Felidae 


40 


33, 


,30 


1, 


,588 


0, 


.118 


0, 


,094 


Viverridae 


34 


37, 


.40 


0, 


,606 


0, 


.034 


0, 


,018 


Herpestidae 


33 


25, 


.50 


0, 


,482 


0, 


.035 


0, 


.017 


Eupleridae 


8 


25, 


.50 


0, 


,916 


0, 


.081 


0, 


.010 


Hyaenidae 


4 


32, 


.20 


0, 


,805 


0, 


.084 


0, 


,001 


Canidae 


35 


48, 


.90 


0, 


,678 


0, 


.031 


0, 


,012 


Ursidae 


8 


42, 


.60 


0, 


.303 


0, 


.025 


0, 


.002 


Otariidae 


16 


24, 


.50 


0, 


.386 


0, 


.029 


0, 


.005 


Phocidae 


19 


24, 


,50 


0, 


.751 


0, 


.065 


0, 


.030 


Mephitidae 


12 


32, 


,00 


0, 


.570 


0, 


.038 


0, 


.004 


Mustelidae 


59 


27, 


.40 


2, 


.263 


0, 


.220 


0, 


.239 


Procyonidae 


14 


27, 


.40 


0, 


.531 


0, 


.038 


0, 


.006 



Table 1: Species richness, TMRCA, body size disparity, and estimated Brownian variance cr^ for 

each family in the Carnivora dataset. Note that Canidae and Herpestidae have very different dis- 
parity measurements, but nearly identical estimates of ct^. This discrepancy is due to the difference 
in their ages; we explain the interaction between time, species number and disparity in greater 
detail in the text. 
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